data management
Can AI autonomously build, operate, and use the entire data stack?
Agarwal, Arvind, Amini, Lisa, Mehta, Sameep, Samulowitz, Horst, Srinivas, Kavitha
Enterprise data management is a monumental task. It spans data architecture and systems, integration, quality, governance, and continuous improvement. While AI assistants can help specific personas, such as data engineers and stewards, navigate and configure the data stack, they fall far short of full automation. However, as AI becomes increasingly capable of tackling tasks that have previously resisted automation due to inherent complexities, we believe there is an imminent opportunity to target fully autonomous data estates. Currently, AI is used in different parts of the data stack, but in this paper, we argue for a paradigm shift from the use of AI in independent data component operations towards a more holistic and autonomous handling of the entire data lifecycle. Towards that end, we explore how each stage of the modern data stack can be autonomously managed by intelligent agents to build self-sufficient systems that can be used not only by human end-users, but also by AI itself. We begin by describing the mounting forces and opportunities that demand this paradigm shift, examine how agents can streamline the data lifecycle, and highlight open questions and areas where additional research is needed. We hope this work will inspire lively debate, stimulate further research, motivate collaborative approaches, and facilitate a more autonomous future for data systems.
- Europe > Austria > Vienna (0.14)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Banking & Finance > Trading (0.93)
- Government (0.68)
Aixel: A Unified, Adaptive and Extensible System for AI-powered Data Analysis
Zhang, Meihui, Wang, Liming, Zhang, Chi, Luo, Zhaojing
A growing trend in modern data analysis is the integration of data management with learning, guided by accuracy, latency, and cost requirements. In practice, applications draw data of different formats from many sources. Meanwhile, objectives and budgets change over time. Existing systems handle these applications across databases, analysis libraries, and tuning services. Such fragmentation leads to complex user interaction, limited adaptability, suboptimal performance, and poor extensibility across components. To address these challenges, we present Aixel, a unified, adaptive, and extensible system for AI-powered data analysis. The system organizes work across four layers: application, task, model, and data. The task layer provides a declarative interface to capture user intent, which is parsed into an executable operator plan. An optimizer compiles and schedules this plan to meet specified goals in accuracy, latency, and cost. The task layer coordinates the execution of data and model operators, with built-in support for reuse and caching to improve efficiency. The model layer offers versioned storage for indexes, metadata, tensors, and model artifacts. It supports adaptive construction, task-aligned drift detection, and safe updates that reuse shared components. The data layer provides unified data management capabilities, including indexing, constraint-aware discovery, task-aligned selection, and comprehensive feature management. With these layers, Aixel delivers a user-friendly, adaptive, efficient, and extensible system.
- Asia > China > Beijing > Beijing (0.05)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Asia > Middle East > Jordan (0.04)
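The Aixel abstract describes a declarative task interface compiled into an operator plan under accuracy, latency, and cost goals. The minimal sketch below illustrates that compile-and-select step; all class and function names are assumptions for illustration, not Aixel's actual API:

```python
from dataclasses import dataclass

@dataclass
class TaskSpec:
    goal: str              # declarative user intent, e.g. "classify churn"
    max_latency_ms: int    # latency budget
    max_cost_usd: float    # cost budget

@dataclass
class Operator:
    name: str
    est_latency_ms: int
    est_cost_usd: float

def compile_plan(spec: TaskSpec, candidates: list[list[Operator]]) -> list[Operator]:
    """Pick the cheapest candidate operator pipeline that fits the budgets."""
    feasible = [
        plan for plan in candidates
        if sum(op.est_latency_ms for op in plan) <= spec.max_latency_ms
        and sum(op.est_cost_usd for op in plan) <= spec.max_cost_usd
    ]
    if not feasible:
        raise ValueError("no plan meets the stated goals")
    return min(feasible, key=lambda plan: sum(op.est_cost_usd for op in plan))
```

A real optimizer would also weigh estimated accuracy and reuse cached operator results; this sketch only shows how declarative goals prune and rank candidate plans.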
"We are not Future-ready": Understanding AI Privacy Risks and Existing Mitigation Strategies from the Perspective of AI Developers in Europe
Klymenko, Alexandra, Meisenbacher, Stephen, Kelley, Patrick Gage, Peddinti, Sai Teja, Thomas, Kurt, Matthes, Florian
The proliferation of AI has sparked privacy concerns related to training data, model interfaces, downstream applications, and more. We interviewed 25 AI developers based in Europe to understand which privacy threats they believe pose the greatest risk to users, developers, and businesses and what protective strategies, if any, would help to mitigate them. We find that there is little consensus among AI developers on the relative ranking of privacy risks. These differences stem from salient reasoning patterns that often relate to human rather than purely technical factors. Furthermore, while AI developers are aware of proposed mitigation strategies for addressing these risks, they reported minimal real-world adoption. Our findings highlight both gaps and opportunities for empowering AI developers to better address privacy risks in AI.
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Europe > Ukraine (0.04)
- Europe > Sweden (0.04)
- (14 more...)
- Research Report > New Finding (1.00)
- Questionnaire & Opinion Survey (1.00)
- Personal > Interview (1.00)
- Overview (1.00)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Government (1.00)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Museums have tons of data, and AI could make it more accessible but standardizing and organizing it across fields won't be easy
Ice cores in freezers, dinosaurs on display, fish in jars, birds in boxes, human remains and ancient artifacts from long gone civilizations that few people ever see – museum collections are filled with all this and more. These collections are treasure troves that recount the planet's natural and human history, and they help scientists in a variety of different fields such as geology, paleontology, anthropology and more. What you see on a trip to a museum is only a sliver of the wonders held in their collection. Museums generally want to make the contents of their collections available for teachers and researchers, either physically or digitally. However, each collection's staff has its own way of organizing data, so navigating these collections can prove challenging.
- Africa > Middle East > Egypt > Nile Delta (0.40)
- North America > United States > Tennessee (0.06)
Deep Learning and Machine Learning, Advancing Big Data Analytics and Management: Unveiling AI's Potential Through Tools, Techniques, and Applications
Feng, Pohsun, Bi, Ziqian, Wen, Yizhu, Pan, Xuanhe, Peng, Benji, Liu, Ming, Xu, Jiawei, Chen, Keyu, Liu, Junyu, Yin, Caitlyn Heqi, Zhang, Sen, Wang, Jinlang, Niu, Qian, Li, Ming, Wang, Tianyang
Artificial intelligence (AI), machine learning, and deep learning have become transformative forces in big data analytics and management, enabling groundbreaking advancements across diverse industries. This article delves into the foundational concepts and cutting-edge developments in these fields, with a particular focus on large language models (LLMs) and their role in natural language processing, multimodal reasoning, and autonomous decision-making. Highlighting tools such as ChatGPT, Claude, and Gemini, the discussion explores their applications in data analysis, model design, and optimization. The integration of advanced algorithms like neural networks, reinforcement learning, and generative models has enhanced the capabilities of AI systems to process, visualize, and interpret complex datasets. Additionally, the emergence of technologies like edge computing and automated machine learning (AutoML) democratizes access to AI, empowering users across skill levels to engage with intelligent systems. This work also underscores the importance of ethical considerations, transparency, and fairness in the deployment of AI technologies, paving the way for responsible innovation. Through practical insights into hardware configurations, software environments, and real-world applications, this article serves as a comprehensive resource for researchers and practitioners. By bridging theoretical underpinnings with actionable strategies, it showcases the potential of AI and LLMs to revolutionize big data management and drive meaningful advancements across domains such as healthcare, finance, and autonomous systems.
- North America > United States > California > San Francisco County > San Francisco (0.13)
- North America > United States > New York > New York County > New York City (0.04)
- Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
- (7 more...)
- Research Report (1.00)
- Overview (1.00)
Block MedCare: Advancing healthcare through blockchain integration with AI and IoT
Simonoski, Oliver, Bogatinoska, Dijana Capeska
This research explores the integration of blockchain technology in healthcare, focusing on enhancing the security and efficiency of Electronic Health Record (EHR) management. We propose a novel Ethereum-based system that empowers patients with secure control over their medical data. Our approach addresses key challenges in healthcare blockchain implementation, including scalability, privacy, and regulatory compliance. The system incorporates digital signatures, Role-Based Access Control, and a multi-layered architecture to ensure secure, controlled access. We developed a decentralized application (dApp) with user-friendly interfaces for patients, doctors, and administrators, demonstrating the practical application of our solution. A survey among healthcare professionals and IT experts revealed strong interest in blockchain adoption, while also highlighting concerns about integration costs. The study explores future enhancements, including integration with IoT devices and AI-driven analytics, contributing to the evolution of secure, efficient, and interoperable healthcare systems that leverage cutting-edge technologies for improved patient care.
- Europe > North Macedonia > Southwestern Statistical Region > Ohrid Municipality > Ohrid (0.05)
- Europe > Switzerland > Basel-City > Basel (0.04)
- Research Report > New Finding (0.46)
- Research Report > Promising Solution (0.46)
- Overview > Innovation (0.34)
- Information Technology > e-Commerce > Financial Technology (1.00)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence (1.00)
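The Block MedCare abstract mentions Role-Based Access Control as part of its multi-layered architecture. As a minimal sketch of that idea, assuming illustrative role and permission names not taken from the paper:

```python
# Hypothetical role-to-permission mapping; the actual Block MedCare
# roles and actions may differ.
PERMISSIONS = {
    "patient": {"read_own_record", "grant_access"},
    "doctor": {"read_granted_record", "append_record"},
    "admin": {"manage_roles"},
}

def is_allowed(role: str, action: str) -> bool:
    """Return True if the given role holds the requested permission."""
    return action in PERMISSIONS.get(role, set())
```

In an on-chain setting this check would live in a smart contract, with the patient's grant transactions extending a doctor's permissions; the sketch only conveys the access-decision shape.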
GREI Data Repository AI Taxonomy
Chodacki, John, Hahnel, Mark, Iacus, Stefano, Scherle, Ryan, Olson, Eric, Pfeiffer, Nici, Holmes, Kristi, Hosseini, Mohammad
Authors: John Chodacki (California Digital Library), Mark Hahnel (figshare), Stefano Iacus (Dataverse), Ryan Scherle (Dryad), Eric Olson (Center for Open Science), Nici Pfeiffer (Center for Open Science), Kristi Holmes (Zenodo), Mohammad Hosseini (Zenodo) The Generalist Repository Ecosystem Initiative (GREI) is a NIH-funded program where repositories collaborate in a "coopetition" model to enhance the work of generalist data repositories, which are critical infrastructure across research domains. As part of our commitment to this work, we recognize the evolving importance of artificial intelligence (AI) in the future of science and infrastructure. To help our community navigate AI-driven changes, we have developed the following taxonomy to illustrate the roles AI can play in managing data repositories, improving data quality, and increasing accessibility. Building on previously developed taxonomies and our coopetition efforts, the GREI repositories propose the following "GREI Data Repository AI Taxonomy," specifically tailored for data repository roles. Why do we need this?
- North America > United States > California (0.24)
- Europe (0.04)
- Health & Medicine (0.50)
- Government (0.48)
- Law (0.30)
Software Design Pattern Model and Data Structure Algorithm Abilities on Microservices Architecture Design in High-tech Enterprises
This study investigates the impact of software design model capabilities and data structure algorithm abilities on microservices architecture design within enterprises. Utilizing a qualitative methodology, the research involved in-depth interviews with software architects and developers who possess extensive experience in microservices implementation. The findings reveal that organizations emphasizing robust design models and efficient algorithms achieve superior scalability, performance, and flexibility in their microservices architecture. Notably, participants highlighted that a strong foundation in these areas facilitates better service decomposition, optimizes data processing, and enhances system responsiveness. Despite these insights, gaps remain regarding the integration of emerging technologies and the evolving nature of software design practices. This paper contributes to the existing literature by underscoring the critical role of these competencies in fostering effective microservices architectures and suggests avenues for future research to address identified gaps.
- Overview (1.00)
- Research Report > New Finding (0.69)
Building Multi-Agent Copilot towards Autonomous Agricultural Data Management and Analysis
Pan, Yu, Sun, Jianxin, Yu, Hongfeng, Luck, Joe, Bai, Geng, Chamara, Nipuna, Ge, Yufeng, Awada, Tala
Current agricultural data management and analysis paradigms are largely traditional: data collection, curation, integration, loading, storage, sharing, and analysis still involve too much human effort and know-how. Experts, researchers, and farm operators need to understand the data and the whole data management pipeline to make full use of the data. The essential problem with the traditional paradigm is the lack of a layer of orchestrational intelligence that can understand, organize, and coordinate data processing utilities to maximize data management and analysis outcomes. The emerging reasoning and tool-mastering abilities of large language models (LLMs) make them a potentially good fit for this role, enabling a shift from the traditional user-driven paradigm to an AI-driven one. In this paper, we propose and explore the idea of an LLM-based copilot for autonomous agricultural data management and analysis. Based on our previously developed platform for Agricultural Data Management and Analytics (ADMA), we build a proof-of-concept multi-agent system called ADMA Copilot, which understands the user's intent, plans the data processing pipeline, and accomplishes tasks automatically; three agents collaborate in it: an LLM-based controller, an input formatter, and an output formatter. Unlike existing LLM-based solutions, our work defines a meta-program graph that decouples control flow from data flow to enhance the predictability of agent behaviour. Experiments demonstrate the intelligence, autonomy, efficacy, efficiency, extensibility, flexibility, and privacy of our system. We also compare our system with existing ones to show its superiority and potential.
- North America > United States > Nebraska > Lancaster County > Lincoln (0.14)
- North America > United States > North Carolina > Wake County > Raleigh (0.04)
- Information Technology > Security & Privacy (1.00)
- Food & Agriculture > Agriculture (1.00)
- Information Technology > Services (0.70)
- Information Technology > Information Management (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
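The ADMA Copilot abstract's key mechanism is a meta-program graph that decouples control flow from data flow, so the LLM controller can only choose among predeclared successors. A minimal sketch of that idea, with the controller mocked and all node names assumed for illustration:

```python
# Fixed control-flow graph: each node lists its allowed successors.
# Node names are hypothetical, not from the ADMA Copilot paper.
GRAPH = {
    "input_formatter": ["controller"],
    "controller": ["load_data", "analyze"],
    "load_data": ["output_formatter"],
    "analyze": ["output_formatter"],
    "output_formatter": [],
}

def run(intent: str, choose) -> list[str]:
    """Walk the graph from the entry node; `choose` stands in for the LLM."""
    node, trace = "input_formatter", []
    while node:
        trace.append(node)
        edges = GRAPH[node]
        if not edges:
            break
        # The controller may only pick a predeclared successor, which keeps
        # control flow predictable regardless of what the model outputs.
        node = choose(node, edges, intent) if len(edges) > 1 else edges[0]
    return trace
```

Because every transition is checked against the static graph, an unexpected model output can at worst pick a different declared branch, never an arbitrary action.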
User-centric Immersive Communications in 6G: A Data-oriented Approach via Digital Twin
Zhou, Conghao, Hu, Shisheng, Gao, Jie, Huang, Xinyu, Zhuang, Weihua, Shen, Xuemin
In this article, we present a novel user-centric service provision for immersive communications (IC) in 6G to deal with the uncertainty of individual user behaviors while satisfying unique requirements on the quality of multi-sensory experience. To this end, we propose a data-oriented approach for network resource management, featuring personalized data management that can support network modeling tailored to different user demands. Our approach leverages the digital twin (DT) technique as a key enabler. Particularly, a DT is established for each user, and the data attributes in the DT are customized based on the characteristics of the user. The DT functions, corresponding to various data operations, are customized in the development, evaluation, and update of network models to meet unique user demands. A trace-driven case study demonstrates the effectiveness of our approach in achieving user-centric IC and the significance of personalized data management in 6G.
- North America > Canada > Ontario > National Capital Region > Ottawa (0.14)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (2 more...)
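The 6G abstract proposes one digital twin per user with data attributes customized to that user's demands. As a minimal sketch of that per-user customization, with attribute names assumed for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class DigitalTwin:
    """One DT per user; only attributes in the user's profile are tracked."""
    user_id: str
    attributes: dict[str, float] = field(default_factory=dict)

    def update(self, samples: dict[str, float]) -> None:
        # Ignore measurements outside this user's customized attribute set.
        for key, value in samples.items():
            if key in self.attributes:
                self.attributes[key] = value

def make_twin(user_id: str, tracked: list[str]) -> DigitalTwin:
    return DigitalTwin(user_id, {name: 0.0 for name in tracked})
```

A network controller would then feed each user's DT into model development, evaluation, and update; the sketch only shows how attribute sets differ per user.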